Empty RDD

In Spark, calling the emptyRDD() method on the SparkContext creates an empty RDD with no partitions and no elements. The examples below create empty RDDs.

Create empty RDD
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder()
    .master("local[3]")
    .appName("RDD")
    .getOrCreate()
val rdd = spark.sparkContext.emptyRDD          // untyped: RDD[Nothing]
val rddString = spark.sparkContext.emptyRDD[String]  // typed empty RDD
println(rdd)
println(rddString)
println("Num of Partitions: "+rdd.getNumPartitions)
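That an RDD created with emptyRDD really has no elements and no partitions can be checked with isEmpty(), count(), and getNumPartitions; a minimal sketch (the app name is illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
    .master("local[1]")
    .appName("EmptyRDDCheck")
    .getOrCreate()

// emptyRDD allocates no partitions at all
val rdd = spark.sparkContext.emptyRDD[String]
println(rdd.isEmpty())         // true
println(rdd.count())           // 0
println(rdd.getNumPartitions)  // 0

spark.stop()
```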

Create empty RDD with Partition
Using sc.parallelize() with an empty Seq we can create an empty RDD that still has partitions (the default parallelism, unless a partition count is passed explicitly). Writing such a partitioned RDD to a file therefore creates multiple (empty) part files.

  val rdd= spark.sparkContext.parallelize(Seq.empty[String])
  println(rdd)
  println("Num of Partitions: "+rdd.getNumPartitions)
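The part-file behaviour can be seen by saving a partitioned empty RDD with saveAsTextFile; this sketch assumes the `spark` session from the first example, and the output path is purely illustrative:

```scala
// Assumes an existing SparkSession named `spark` (see the first example).
// Passing 3 fixes the partition count regardless of default parallelism.
val emptyWithParts = spark.sparkContext.parallelize(Seq.empty[String], 3)
println(emptyWithParts.getNumPartitions)  // 3

// Illustrative path: the directory ends up with one empty part file
// per partition (part-00000, part-00001, part-00002) plus _SUCCESS.
emptyWithParts.saveAsTextFile("/tmp/empty-rdd-out")
```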

Create empty pair RDD
Pair RDDs (key-value RDDs) appear in most Spark workloads, so here is one more example: an empty RDD of (String, Int) pairs.

type pairRDD = (String, Int)
val resultRDD = spark.sparkContext.emptyRDD[pairRDD]
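A common reason to start from an empty pair RDD is to accumulate results by union as batches arrive; a minimal sketch, assuming the `spark` session from the first example (the batch data is made up for illustration):

```scala
// Assumes an existing SparkSession named `spark`.
type pairRDD = (String, Int)

// Start empty, then union in each batch as it becomes available.
var resultRDD = spark.sparkContext.emptyRDD[pairRDD]
val batch = spark.sparkContext.parallelize(Seq(("a", 1), ("b", 2)))
resultRDD = resultRDD.union(batch)
println(resultRDD.count())  // 2
```

Note that union is lazy and cheap here; for many small unions in a long loop, the lineage can grow long, so an occasional checkpoint may help.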
